Quantitative Counting with Bayes Theorem

Author

  • P. Tar
Abstract

Given a dataset generated by various processes, an estimate of the quantity of data points associated with each generating process can be obtained using a probabilistic count of evidence based on Bayes Theorem. However, the usefulness of these counts is limited by uncertainties stemming from the maximum likelihood determination of quantities and from the imprecision of density estimates. Systematic effects are also present in practice due to a lack of agreement between modelled data densities and actual data. This document follows on from Tina memo 2010-008 and shows how predictable sources of variation can be computed and how systematic effects can be minimised by using mixtures of density estimates to improve model-data agreement.

1 Quantitative counting

Standard classifiers (Support Vector Machines, Random Forests, Boosting, etc.) can be used to assign decisive class labels to incoming data points. A discrete count of the resulting labels could form the basis of an estimate of the quantities of the classes present. However, any ambiguity will inevitably result in classification mistakes and may bias the results. An alternative method weights the count of each data point by the probability that it belongs to a given class, rather than forcing a hard decision. Assuming honest probabilities, this method corrects for misclassifications and thereby provides an unbiased estimate of quantities.

Given a set of data generation processes, J = {1, 2, ..., N}, the quantity of any given class occurring in a region can be estimated using

Q(j) = \sum_{d \in R} P(j | X_d)

where X_d is the evidence observed at location d within region R of the dataset and P(j | X_d) is the probability that class j was the source of the observation. The class probability can be found using Bayes Theorem

P(j | X) = \frac{P(X | j) Q(j)}{\sum_{l \in J} P(X | l) Q(l)}

where P(X | j) is a density estimate for class j and Q(j) is the number of class j instances present in the region. Maximum likelihood estimates of the values of Q(j) are those which maximise the probability that the modelled data densities account for the actual observations. They can be found by minimising

F = \sum_{d \in R} -\log \sum_{l \in J} P(X_d | l) Q(l)

which can be achieved using the parameter update methods of Expectation Maximisation. The usefulness of the resulting quantities is limited by their accuracy. To be used quantitatively, an estimate of their uncertainty is needed. The remainder of this document addresses the issue of uncertainty and how it can be managed.
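The counting and maximum likelihood steps above reduce to a single fixed-point iteration: the Expectation Maximisation update for Q(j) is simply the probabilistic count evaluated with the current quantity estimates. The following Python sketch illustrates this, assuming the class densities P(X_d | j) have already been evaluated and stored in a (points x classes) array; the function name, array layout and convergence settings are illustrative choices rather than part of the memo.

```python
import numpy as np

def estimate_quantities(likelihoods, n_iter=500, tol=1e-9):
    """Estimate class quantities Q(j) by probabilistic counting.

    likelihoods : (D, J) array with likelihoods[d, j] = P(X_d | j),
                  the class j density evaluated at data point d.
    Returns (Q, posteriors) with Q[j] the estimated quantity of class j
    and posteriors[d, j] = P(j | X_d).
    """
    n_points, n_classes = likelihoods.shape
    Q = np.full(n_classes, n_points / n_classes)         # equal quantities to start
    for _ in range(n_iter):
        weighted = likelihoods * Q                        # P(X_d | j) Q(j)
        evidence = weighted.sum(axis=1, keepdims=True)    # sum_l P(X_d | l) Q(l)
        posteriors = weighted / evidence                  # Bayes Theorem: P(j | X_d)
        Q_new = posteriors.sum(axis=0)                    # probabilistic count, Q(j)
        if np.max(np.abs(Q_new - Q)) < tol:               # stop once counts stop moving
            return Q_new, posteriors
        Q = Q_new
    return Q, posteriors
```

On convergence the returned posteriors are exactly the P(j | X_d) values required by the covariance calculations of the next section.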
2 Calculating covariances

Summary outputs presented to end users can only be meaningfully interpreted if they are accompanied by estimates of their errors; for example, points on plots should be overlaid with error bars denoting clear confidence intervals. This is especially important if results are to be used in further calculations. A covariance matrix provides all the necessary information for both interpretation and further processing, so determining the covariance matrix of the quantitative counts Q(j) is essential.

A covariance matrix for quantitative counting must take into consideration the most significant sources of uncertainty. These include independent contributions from the variance in the maximum likelihood determination of quantities and from the imprecision of the density estimates. Elements of the covariance matrix are then given by

C_{ij} = \mathrm{cov}_{CRB}(Q(i), Q(j)) + \mathrm{cov}_{DE}(Q(i), Q(j))

where \mathrm{cov}_{CRB} is the contribution from the likelihood, which can be estimated using the Cramer Rao Bound, and \mathrm{cov}_{DE} is the contribution from imprecise density estimates, which can be estimated using error propagation. These two contributions are derived separately in the following two sections.

2.1 Uncertainty in counting

The contribution to the uncertainty of the counts due to their maximum likelihood determination can be estimated using the Cramer Rao lower variance bound

\frac{\partial^2 F}{\partial Q(i) \partial Q(j)} \geq \frac{1}{\mathrm{cov}(Q(i), Q(j))}

Computing the first and second order derivatives of F yields

\frac{\partial F}{\partial Q(j)} = -\sum_{d \in R} \frac{P(X_d | j)}{\sum_{l \in J} P(X_d | l) Q(l)}

\frac{\partial^2 F}{\partial Q(i) \partial Q(j)} = \sum_{d \in R} \frac{P(X_d | i) P(X_d | j)}{\left[ \sum_{l \in J} P(X_d | l) Q(l) \right]^2}

The similarity with Bayes Theorem allows the second derivative to be rewritten as

\frac{\partial^2 F}{\partial Q(i) \partial Q(j)} = \frac{1}{Q(i) Q(j)} \sum_{d \in R} P(i | X_d) P(j | X_d)

so that the final inverse covariance estimate is

C^{-1}_{ij} \approx \frac{\sum_{d \in R} P(i | X_d) P(j | X_d)}{Q(i) Q(j)}

from which the covariance matrix can be computed using a standard matrix inversion algorithm. The covariance contribution from this lower bound is referred to as \mathrm{cov}_{CRB}(Q(i), Q(j)). The next section derives the covariance contributions due to statistical perturbations in the density estimates.

2.2 Uncertainty in density estimation

Uncertainty due to imprecision in density estimation can be tracked using error propagation. The characteristics of these errors depend upon the type of density model used. The least sophisticated method of density estimation is that based upon histograms. Each bin, i.e. each pattern X, within a histogram-based density is subject to independent Poisson errors. The following contribution to the covariance matrix assumes the use of such histograms.

Patterns are likely to occur multiple times within a dataset, with every occurrence of a given pattern being subject to the same error. Because of this correlation it is better to formulate quantities in terms of unique patterns rather than individual data points, giving

Q(j) = \sum_{X \in R} \frac{P(X | j) Q(j)}{P(X | j) Q(j) + P(X | \bar{j}) Q(\bar{j})} Q(X)

where Q(X) is the total quantity of pattern X appearing within region R and the denominator of Bayes Theorem has been separated into independent error components, with \bar{j} representing all classes other than j. The covariance contribution \mathrm{cov}_{DE}(Q(i), Q(j)) then follows by propagating the independent Poisson errors of the individual bins through this expression.
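For the histogram-based densities of section 2.2, the counting therefore runs over unique patterns (histogram bins) weighted by their multiplicity Q(X). The sketch below shows only this pattern-level counting step, not the error propagation itself; the names pattern_likelihoods, pattern_counts and quantities_by_pattern are hypothetical and simply mirror the array layout used in the earlier sketch.

```python
import numpy as np

def quantities_by_pattern(pattern_likelihoods, pattern_counts, Q):
    """Accumulate Q(j) over unique patterns rather than data points.

    pattern_likelihoods : (B, J) array with P(X | j) for each unique
                          pattern (histogram bin) X and class j.
    pattern_counts      : (B,) array with Q(X), the number of points in
                          region R exhibiting pattern X.
    Q                   : (J,) array of current quantity estimates.
    """
    weighted = pattern_likelihoods * Q                    # P(X | j) Q(j)
    # Denominator split as P(X|j)Q(j) + P(X|j_bar)Q(j_bar): the class of
    # interest plus the pooled contribution of all other classes.
    evidence = weighted.sum(axis=1, keepdims=True)
    posteriors = weighted / evidence                      # P(j | X)
    # Each unique pattern contributes its posterior weight Q(X) times.
    return (posteriors * pattern_counts[:, None]).sum(axis=0)
```

Grouping identical patterns in this way keeps the correlated errors of repeated observations attached to a single Poisson-distributed bin count, which is what the error propagation acts on.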

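Returning to section 2.1, the Cramer Rao contribution can be computed directly from the posterior probabilities and quantity estimates produced by the counting step. The sketch below assumes the same (points x classes) posterior array as before, and numpy's general-purpose inverse stands in for whichever standard matrix inversion routine is preferred.

```python
import numpy as np

def crb_covariance(posteriors, Q):
    """Cramer Rao contribution cov_CRB(Q(i), Q(j)).

    posteriors : (D, J) array with posteriors[d, j] = P(j | X_d).
    Q          : (J,) array of maximum likelihood quantity estimates.
    """
    # Inverse covariance: C^{-1}_ij = sum_d P(i|X_d) P(j|X_d) / (Q(i) Q(j))
    inv_cov = (posteriors.T @ posteriors) / np.outer(Q, Q)
    return np.linalg.inv(inv_cov)
```

The square roots of the diagonal elements give error bars for the individual Q(j), while the off-diagonal elements describe how the estimation errors of competing classes are correlated.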